Parametrised Hausdorff Distance as a Non-Metric Similarity Model for Tandem Mass Spectrometry

Authors

  • Jiri Novák
  • David Hoksza
Abstract

Tandem mass spectrometry is a widely used method for identifying protein and peptide sequences. Because mass spectra contain up to 80% noise and many other inaccuracies, there is still a need for more accurate algorithms for interpreting them. Protein databases are growing rapidly, and methods that index these databases in order to interpret mass spectra are becoming very popular. This paper presents the parametrised Hausdorff distance, which is suitable for non-metric search. It models the similarity among tandem mass spectra very well and, in many cases, can match a spectrum to the correct peptide sequence without any post-processing scoring system.
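
The abstract does not define the parametrised variant, so the following is only a minimal sketch of the classical (unparametrised) Hausdorff distance between two peak lists, assuming each spectrum is reduced to a plain list of m/z values. The function names and the example peaks are illustrative and are not taken from the paper.

```python
def directed_hausdorff(a, b):
    """Largest distance from any peak in `a` to its nearest peak in `b`."""
    return max(min(abs(x - y) for y in b) for x in a)


def hausdorff(a, b):
    """Classical symmetric Hausdorff distance between two non-empty peak lists."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical m/z values, chosen only to illustrate the computation.
query_peaks = [1.0, 2.0, 5.0]
theoretical_peaks = [1.0, 2.5]

distance = hausdorff(query_peaks, theoretical_peaks)
print(distance)
```

In this classical form a single unmatched peak dominates the result, which is one reason a noise-tolerant, parametrised variant may be preferable for noisy spectra.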

Similar resources

Non-metric similarity search of tandem mass spectra including posttranslational modifications

In biological applications, tandem mass spectrometry is a widely used method for determining protein and peptide sequences from an "in vitro" sample. The sequences are not determined directly; they must be interpreted from the mass spectra, which are the output of the mass spectrometer. This work is focused on a similarity-search approach to mass spectra interpretation, where the paramet...

On Comparison of SimTandem with State-of-the-Art Peptide Identification Tools, Efficiency of Precursor Mass Filter and Dealing with Variable Modifications

The similarity search in theoretical mass spectra generated from protein sequence databases is a widely accepted approach for identification of peptides from query mass spectra produced by shotgun proteomics. Growing protein sequence databases and noisy query spectra demand database indexing techniques and better similarity measures for the comparison of theoretical spectra against query spectr...

Riemannian manifolds, spaces of measures and the Gromov-Hausdorff distance

We equip the space M(X) of all Borel probability measures on a compact Riemannian manifold X with a canonical distance function which induces the weak-∗ topology on M(X) and has the following property: the map X ↦ M(X) is Lipschitz continuous with respect to the Gromov-Hausdorff distance on the space of all the (isometry classes of) compact metric spaces. Introduction Last century brought sever...

Mining Mass Spectra: Metric Embeddings and Fast Near Neighbor Search

Mining large-scale high-throughput tandem mass spectrometry data sets is a very important problem in mass spectrometry based protein identification. One of the fundamental problems in large scale mining of spectra is to design appropriate metrics and algorithms to avoid all-pair-wise comparisons of spectra. In this paper, we present a general framework based on vector spaces to avoid pair-wise ...

Publication date: 2010